The sound of music has changed as society has changed, so the changing sound of music reflects our cultural evolution. Unlike in the past, modern society mixes many diverse cultures, and this shows in how music sounds. What about the lyrics, then? Have the words used in lyrics also changed much over time? Since things have become more diverse, it feels like lyrics might have become more complicated. Let’s find out by analyzing the lyrics data.
First, we will see whether the number of words in lyrics has changed over time. If it has increased, it would be reasonable to say that songs have become more complicated.
We can see that the quantiles of the number of words have not changed much. However, the number of outliers increases over time. This may simply be because there is more data for the 2000s and 2010s. We need more analysis to find the reason.
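As a minimal sketch of this step, the snippet below counts the words in each song and summarizes the counts per decade with quartiles, which is the information a boxplot displays. The `songs` records and the `decade` helper are hypothetical and made up for illustration; they are not the dataset or code behind this analysis.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical toy records: (release year, lyrics). Not the real dataset.
songs = [
    (1975, "love me tender love me true"),
    (1978, "let it be let it be"),
    (1979, "hey jude"),
    (2004, "yeah yeah yeah we got the beat and we never stop"),
    (2008, "one two three four five six seven eight"),
    (2009, "hello"),
]

def decade(year):
    """Map a year such as 1978 to the first year of its decade, 1970."""
    return year // 10 * 10

# Collect the word count of every song under its decade.
counts_by_decade = defaultdict(list)
for year, lyrics in songs:
    counts_by_decade[decade(year)].append(len(lyrics.split()))

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
for dec in sorted(counts_by_decade):
    q1, med, q3 = quantiles(counts_by_decade[dec], n=4)
    print(dec, "word counts:", counts_by_decade[dec], "median:", med)
```

Comparing the quartiles across decades, rather than only the maximum, is what separates a general shift in song length from a handful of very long songs.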
As we can see from the above plot, “Rock”, “Pop”, and “Jazz” were the three dominant genres in the 1970s and 1980s. The share of “Hip-Hop” begins to increase in 1990, as do the shares of other genres, while the shares of “Jazz” and “Rock” begin to decrease. “Hip-Hop” usually has more words in its lyrics than other genres, so its growing share over time may be one reason for the outliers above.
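The genre percentages discussed here can be computed along the following lines. This is only a sketch under assumptions: the `records` list is invented for illustration, and `genre_share` is a hypothetical helper name, not part of the original analysis.

```python
from collections import Counter, defaultdict

# Hypothetical (year, genre) pairs; the real dataset is not reproduced here.
records = [
    (1975, "Rock"), (1976, "Rock"), (1978, "Jazz"), (1979, "Pop"),
    (1995, "Rock"), (1996, "Hip-Hop"), (1997, "Pop"), (1998, "Hip-Hop"),
]

# Count songs per genre within each decade.
genre_counts = defaultdict(Counter)
for year, genre in records:
    genre_counts[year // 10 * 10][genre] += 1

def genre_share(decade):
    """Return {genre: percentage of that decade's songs}."""
    total = sum(genre_counts[decade].values())
    return {g: 100 * n / total for g, n in genre_counts[decade].items()}

for decade in sorted(genre_counts):
    print(decade, genre_share(decade))
```

Working with percentages instead of raw counts keeps the decades comparable even though later decades contain far more songs.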
This dot plot shows the number of words in each song, grouped by genre and time period. As we expected, most of the songs that use a large number of words are Hip-Hop and modern Rock.
Songs with a large number of words clearly appear more often in recent years. However, as we saw in the boxplot above, the quantiles haven’t changed much, which means the typical number of words has not changed. So it’s hard to say from this single analysis that the lyrics of modern songs have become more complicated. We need to do more.
Now let’s see whether the popular words in lyrics have changed over time. From now on, I will divide the timeline into two periods, ‘Before 2000’ and ‘After 2000’, representing older and modern times respectively.
In the word cloud for before 2000, the most frequently used word is “love”, and most of the words look very positive.
In the word cloud for after 2000, the most frequently used word is ALSO “love”. The difference from the previous word cloud is that we can see more negative words, like “die”, “tear”, and “wrong”, and even some swearing. The word “world” is also used much more often than before.
| (row index) | word | Freq |
|---|---|---|
| 9837 | love | 12126 |
| 17036 | time | 5628 |
| 1124 | baby | 5224 |
| 19227 | youre | 4895 |
| 8277 | ill | 3694 |
| 4198 | day | 3543 |
| 11279 | night | 3497 |
| 7624 | heart | 3077 |
| 9563 | life | 3015 |
| 8672 | ive | 3009 |
| (row index) | word | Freq |
|---|---|---|
| 53162 | love | 181926 |
| 91242 | time | 107374 |
| 103530 | youre | 95775 |
| 6177 | baby | 72401 |
| 44463 | ill | 68221 |
| 21983 | day | 62033 |
| 51667 | life | 59782 |
| 40701 | heart | 55775 |
| 46470 | ive | 54275 |
| 61429 | night | 51518 |
However, the top 10 most popular words are almost the same in both periods. Perhaps this is because most songs are about love, in the past as well as the present.
From the above analysis, we found that the popular words have changed only a little, and that negative words began to show up more in songs after 2000.
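A top-words table like the two above can be produced by splitting the songs into the two periods and counting words in each. The sketch below assumes a tiny stop-word list and a few invented lyrics; `STOP_WORDS`, `songs`, and `period` are hypothetical names, and the boundary (years up to 1999 count as “Before 2000”) is an assumption.

```python
from collections import Counter

# Hypothetical stop-word list and lyrics, for illustration only.
STOP_WORDS = {"the", "a", "and", "i", "you", "me", "my", "is", "in", "of"}

songs = [
    (1985, "love is in the air and love is all around me"),
    (1992, "baby baby i love you"),
    (2005, "time after time you hurt me baby"),
    (2012, "love the world and the world will love you"),
]

def period(year):
    # Assumed boundary: years up to 1999 are "Before 2000".
    return "Before 2000" if year < 2000 else "After 2000"

# One frequency counter per period, skipping stop words.
freq = {"Before 2000": Counter(), "After 2000": Counter()}
for year, lyrics in songs:
    words = [w for w in lyrics.lower().split() if w not in STOP_WORDS]
    freq[period(year)].update(words)

for name, counter in freq.items():
    print(name, counter.most_common(3))
```

Because the two periods differ greatly in size, it is the ranking of the words, not the raw frequency, that is comparable between them.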
In this part, I wanted to know whether the readability of lyrics has gotten worse over time. As readability measures, I used “Bormuth.MC”, “Coleman”, “Dale.Chall”, and “Flesch”. For all four measures, a higher score means the text is easier to read.
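To give a feel for how such a score works, here is a rough sketch of the Flesch reading ease formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The syllable counter is a crude vowel-run heuristic and the sentence splitter is an assumption, so the numbers will not match those of a dedicated readability library; both helper functions are hypothetical and not the code used for this analysis.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch reading ease: higher scores mean easier text."""
    # Treat sentence punctuation and line breaks as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

simple = "I love you. You love me."
dense = "Existential contemplation permeates contemporary compositions."
print(flesch_reading_ease(simple), flesch_reading_ease(dense))
```

Short sentences made of short words score high, while long, polysyllabic lines score low, which is the sense in which a higher score means easier to read.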
(Darker cells represent higher scores.) From the above heatmap, we can clearly see that the readability scores of songs decrease over time, which means the lyrics are getting harder to read. (Only the “Coleman” measure shows a different result, probably because it uses a different formula.) The readability scores of lyrics written in the 1970s are especially high. Songs in the old days were easier to read, and therefore probably easier to understand while listening. What could be the reason for this decline in readability over time? To find out, I analyzed the readability score for each genre.
When we analyzed the share of each genre over time, the biggest changes were in “Jazz”, “Rock”, and “Hip-Hop”. “Jazz” and “Rock” show moderate readability scores, so their changing shares may not have affected overall readability much. “Hip-Hop”, however, shows a somewhat low readability score, so the increase in “Hip-Hop” music might have caused the decrease in the readability of modern lyrics. So far, it seems reasonable to say that songs made in recent years use more complicated lyrics: there are more songs with a large number of words, and the lyrics are harder to read.
In the topic analysis above, we saw more negative words in the word cloud of lyrics written after 2000, but we do not have any numerical value for it. So I did a sentiment analysis to get one. Sentiment analysis scores how positive or negative each word is. By averaging the scores within each period, we can compare the two periods, before 2000 and after 2000.
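A lexicon-based scoring of this kind can be sketched as follows. Everything here is an assumption made for illustration: the mini `LEXICON`, the sample `songs`, and the choice to sum word scores per song before averaging per genre. The exact lexicon and aggregation behind the numbers in this post are not stated, so treat this as one plausible reading.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mini-lexicon: positive words score above zero, negative below.
LEXICON = {"love": 3, "happy": 3, "sweet": 2, "wrong": -2, "tear": -2, "die": -3}

# Hypothetical (year, genre, lyrics) records.
songs = [
    (1985, "Pop", "sweet love happy love"),
    (1991, "Rock", "love gone wrong"),
    (2006, "Pop", "love and a tear"),
    (2011, "Rock", "wrong to die die"),
]

def song_sentiment(lyrics):
    """Sum the lexicon scores of the words; unknown words count as 0."""
    return sum(LEXICON.get(w, 0) for w in lyrics.lower().split())

# Group the per-song scores by (period, genre).
scores = defaultdict(list)
for year, genre, lyrics in songs:
    period = "before 2000" if year < 2000 else "after 2000"
    scores[(period, genre)].append(song_sentiment(lyrics))

genre_means = {key: mean(vals) for key, vals in scores.items()}
print(genre_means)
```

With per-song sums, a long song full of negative words gets a strongly negative score, which would be consistent with word-heavy genres sitting at the extremes.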
| genre | mean(sentiment) |
|---|---|
| Country | -2.76923076923077 |
| Electronic | -0.857142857142857 |
| Folk | -1.48648648648649 |
| Hip-Hop | -9.38721804511278 |
| Jazz | 2.90704225352113 |
| Metal | -9.08943089430894 |
| Other | 4.04347826086957 |
| Pop | 1.48042704626335 |
| R&B | 0.996753246753247 |
| Rock | -2.80901922129128 |
| Mean | -1.69708274661658 |
In the sentiment analysis of lyrics written before 2000, “Hip-Hop” and “Metal” are the two genres with the most negative sentiment, and “Other” and “Jazz” are the two genres with the most positive sentiment. The other genres have sentiment scores between -3 and 3. The overall average sentiment score is -1.70, which is slightly negative.
| genre | mean(sentiment) |
|---|---|
| Country | -0.588978345363687 |
| Electronic | -2.01710376282782 |
| Folk | -2.86582809224319 |
| Hip-Hop | -13.5943999070524 |
| Indie | -2.76325903151422 |
| Jazz | 1.76883883078573 |
| Metal | -9.80942455822383 |
| Other | -5.40322580645161 |
| Pop | -0.382893549842129 |
| R&B | -0.423035522066738 |
| Rock | -3.37077680234105 |
| Mean | -3.58637150428554 |
In the sentiment analysis of lyrics written after 2000, “Hip-Hop” and “Metal” are again the two genres with the most negative sentiment, and “Jazz” is the only genre with a positive score. Almost every genre scores lower than it did before 2000; the exception is “Country”, which moves from -2.77 to -0.59. Even “Other”, which had the highest sentiment score in the previous table, is now clearly negative. Of course, the overall average sentiment score is also much lower, at -3.59. This analysis therefore gives clearer evidence that lyrics written after 2000 use more negative words. I think this can make the lyrics harder for some groups of people to understand while listening.
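As a quick check on the two tables, the “Mean” rows can be reproduced from the genre rows. The snippet below copies the values from the tables and shows that each “Mean” is the plain, unweighted average of the genre means, so every genre counts equally regardless of how many songs it contains. The variable names are my own.

```python
from statistics import mean

# Genre means copied from the "before 2000" table.
before_2000 = {
    "Country": -2.76923076923077, "Electronic": -0.857142857142857,
    "Folk": -1.48648648648649, "Hip-Hop": -9.38721804511278,
    "Jazz": 2.90704225352113, "Metal": -9.08943089430894,
    "Other": 4.04347826086957, "Pop": 1.48042704626335,
    "R&B": 0.996753246753247, "Rock": -2.80901922129128,
}

# Genre means copied from the "after 2000" table.
after_2000 = {
    "Country": -0.588978345363687, "Electronic": -2.01710376282782,
    "Folk": -2.86582809224319, "Hip-Hop": -13.5943999070524,
    "Indie": -2.76325903151422, "Jazz": 1.76883883078573,
    "Metal": -9.80942455822383, "Other": -5.40322580645161,
    "Pop": -0.382893549842129, "R&B": -0.423035522066738,
    "Rock": -3.37077680234105,
}

# Unweighted average across genres, as in the "Mean" rows.
print(round(mean(before_2000.values()), 2))
print(round(mean(after_2000.values()), 2))

# Genres whose mean sentiment is still positive after 2000.
print([g for g, v in after_2000.items() if v > 0])
```

Because the average is unweighted, a small genre such as “Other” moves the overall mean as much as a large one; a song-weighted mean could give a different picture.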
After doing exploratory data analysis on the lyrics data, I got the following results.
In general, the number of words used in lyrics has not changed over the years. However, the number of songs with many words in their lyrics began to increase dramatically around 2000, probably due to the rise of “Hip-Hop” music in recent years.
The most popular words in lyrics haven’t changed much over the years, as the majority of songs are about “LOVE”. More negative words show up in the “After 2000” word cloud.
The lyrics are getting harder and harder to read, probably because the share of “Hip-Hop” music, which has a low readability score, increased dramatically over time.
Hence, song lyrics are indeed getting more complicated and harder to understand, and certain groups of people may find them even harder.